Nuclear norm minimization for the planted clique and biclique problems

Brendan P.W. Ames    Stephen A. Vavasis
January 21, 2009 


Abstract 

We consider the problems of finding a maximum clique in a graph and finding a
maximum-edge biclique in a bipartite graph. Both problems are NP-hard. We write
both problems as matrix-rank minimization and then relax them using the nuclear
norm. This technique, which may be regarded as a generalization of compressive
sensing, has recently been shown to be an effective way to solve rank optimization
problems. In the special cases that the input graph has a planted clique or biclique
(i.e., a single large clique or biclique plus diversionary edges), our algorithm successfully
provides an exact solution to the original instance. For each problem, we provide
two analyses of when our algorithm succeeds. In the first analysis, the diversionary
edges are placed by an adversary. In the second, they are placed at random. In the
case of random edges for the planted clique problem, we obtain the same bound as
Alon, Krivelevich and Sudakov as well as Feige and Krauthgamer, but we use different
techniques.

1 Introduction



Several recent papers including Recht et al. [17] and Candès and Recht [4] consider nuclear
norm minimization as a convex relaxation of matrix rank minimization. Matrix rank
minimization refers to the problem of finding a matrix $X \in \mathbb{R}^{m \times n}$ to minimize
$\mathrm{rank}(X)$ subject to linear constraints on $X$. As we shall show in Sections 3 and 4, the clique
and biclique problems, both NP-hard, are easily expressed as matrix rank minimization, thus
showing that matrix rank minimization is also NP-hard.

Each of the two papers mentioned in the previous paragraph has results of the following 
general form. Suppose an instance of matrix rank minimization is posed in which it is known 
a priori that a solution of very low rank exists. Suppose further that the constraints are 
random in some sense. Then the nuclear norm relaxation turns out to be exact, i.e., it
recovers the (unique) solution of low rank. The nuclear norm of a matrix X, also called the 
trace norm, is defined to be the sum of the singular values of X. 
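
To make the definition concrete, the following minimal sketch computes the nuclear norm and the rank of a small matrix from its singular values. It assumes NumPy is available; the helper names `nuclear_norm` and `numerical_rank` and the tolerance `tol` are illustrative choices, not notation from the papers cited above.

```python
import numpy as np

def nuclear_norm(X):
    """Nuclear (trace) norm: the sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

def numerical_rank(X, tol=1e-10):
    """Number of singular values above tol (the quantity the nuclear norm relaxes)."""
    return int((np.linalg.svd(X, compute_uv=False) > tol).sum())

# A rank-one 0/1 block: the outer product of two characteristic vectors.
u = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0, 0.0])
X = np.outer(u, v)

# For a rank-one matrix u v^T the only nonzero singular value is ||u|| * ||v||.
print(numerical_rank(X), nuclear_norm(X))
```

For a rank-one matrix the nuclear norm coincides with the spectral and Frobenius norms, which is the situation exploited for the planted clique and biclique.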

These authors build upon recent breakthroughs in compressive sensing [10], [5]. In
compressive sensing, the problem is to recover a sparse vector that solves a set of linear
equations. In the case that the equations are randomized and a very sparse solution exists,
compressive sensing can be solved by relaxation to the $\ell_1$ norm. The correspondence between
matrix rank minimization and compressive sensing is as follows: matrix rank (number of
nonzero singular values) corresponds to vector sparsity (number of nonzero entries) and
nuclear norm corresponds to $\ell_1$ norm.

Our results follow the spirit of Recht et al. but use different technical approaches. We es- 
tablish results about two well known graph theoretic problems, namely maximum clique and 
maximum-edge biclique. The maximum clique problem takes as input an undirected graph 
and asks for the largest clique (i.e., induced subgraph of nodes that are completely intercon- 
nected). This problem is one of Karp's original NP-hard problems j8]. The maximum-edge 
biclique problem takes as input a bipartite graph $(U, V, E)$ and asks for the subgraph that is a
complete bipartite graph $K_{m,n}$ that maximizes the product $mn$. This problem was shown to be
NP-hard by Peeters [16].

In Sections 3 and 4 we relax these problems to convex optimization using the nuclear
norm. For each problem, we show that convex optimization can recover the exact solution in
two cases. The first case, described in Section 3.2, is the adversarial case: the $N$-node graph
under consideration consists of a single $n$-node clique plus a number of diversionary edges
chosen by an adversary. We show that the algorithm can tolerate up to $O(n^2)$ diversionary
edges provided that no non-clique vertex is adjacent to more than $O(n)$ clique vertices. We
argue also that these two bounds, $O(n^2)$ and $O(n)$, are the best possible. We show analogous
results for the biclique problem in Section 4.1.

Our second analysis, described in Sections 3.3 and 4.2, supposes that the graph contains
a single clique or biclique, while the remaining nonclique edges are inserted independently
at random with fixed probability $p$. This problem has been studied by Alon et al. [2] and
by Feige and Krauthgamer [6]. In the case of clique, we obtain the same result as they do,
namely, that as long as the clique has at least $O(N^{1/2})$ nodes, where $N$ is the number of
nodes in $G$, then our algorithm will find it. Like Feige and Krauthgamer, our algorithm
also certifies that the maximum clique has been found due to a uniqueness result for convex
optimization, which we present in Section 3.1. We believe that our technique is more general
than that of Feige and Krauthgamer; for example, ours extends essentially without alteration to
the biclique problem, whereas Feige and Krauthgamer rely on some special properties of the
clique problem. Furthermore, Feige and Krauthgamer use more sophisticated probabilistic
tools (martingales), whereas our results use only Chernoff bounds and classical theorems
about the norms of random matrices. The random matrix results needed for our main
theorems are presented in Section 2.

Our interest in the planted clique and biclique problems arises from applications in data 
mining. In data mining, one seeks a pattern hidden in an apparently unstructured set of 
data. A natural question to ask is whether a data mining algorithm is able to find the hidden 
pattern in the case that it is actually present but obscured by noise. For example, in the 
realm of clustering, Ben-David pQ has shown that if the data is actually clustered, then a 
clustering algorithm can find the clusters. The clique and biclique problems are both simple
model problems for data mining. For example, Pardalos [13] reduces a data mining problem
in epilepsy prediction to a maximum clique problem. Gillis and Glineur [11] use the biclique
problem as a model problem for nonnegative matrix factorization and finding features in 
images. 

2 Results on norms of random matrices 

In this section we provide a few results concerning random matrices with independent
and identically distributed (i.i.d.) entries of mean 0. In particular, the probability distribution
$\Omega$ for an entry $A_{ij}$ will be as follows:

\[ A_{ij} = \begin{cases} 1 & \text{with probability } p, \\ -p/(1-p) & \text{with probability } 1-p. \end{cases} \]

It is easy to check that the variance of $A_{ij}$ is $\sigma^2 = p/(1-p)$.
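
The mean and variance claims can be checked by exact rational arithmetic. The sketch below is only an illustration of that check; the function name `omega_moments` is our own and uses nothing beyond the Python standard library.

```python
from fractions import Fraction

def omega_moments(p):
    """Exact mean and variance of one entry drawn from the two-point distribution Omega."""
    p = Fraction(p)
    # (value, probability) pairs: 1 w.p. p, and -p/(1-p) w.p. 1-p
    outcomes = [(Fraction(1), p), (-p / (1 - p), 1 - p)]
    mean = sum(val * prob for val, prob in outcomes)
    var = sum((val - mean) ** 2 * prob for val, prob in outcomes)
    return mean, var

mean, var = omega_moments(Fraction(1, 4))
# For p = 1/4 the variance p/(1-p) is 1/3.
print(mean, var)
```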
We start by recalling a theorem of Füredi and Komlós [7]:

Theorem 2.1 For all integers $1 \le j \le i \le n$, let $A_{ij}$ be distributed according to $\Omega$.
Define symmetrically $A_{ij} = A_{ji}$ for all $i < j$.

Then the random symmetric matrix $A = [A_{ij}]$ satisfies

\[ \|A\| \le 3\sigma\sqrt{n} \]

with probability at least $1 - \exp(-cn^{1/6})$ for some $c > 0$ that depends on $\sigma$.

Remark 1. In this theorem and for the rest of the paper, $\|A\|$ denotes $\|A\|_2$, often called
the spectral norm. It is equal to the maximum singular value of $A$ or equivalently to the
square root of the maximum eigenvalue of $A^T A$.

Remark 2. The theorem is not stated exactly in this way in [7]; the stated form of the
theorem can be deduced by taking $k = (\sigma/K)^{1/3} n^{1/6}$ and $v = \sigma\sqrt{n}$ in the inequality

\[ P(\max|\lambda| > 2\sigma\sqrt{n} + v) < \sqrt{n}\,\exp(-kv/(2\sqrt{n} + v)) \]

on p. 237.

Remark 3. As mentioned above, the mean value of entries of $A$ is 0. This is crucial for the
theorem; a distribution with any other mean value would lead to $\|A\| = O(n)$.
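
The following sketch draws one symmetric matrix with entries from $\Omega$ and compares its spectral norm with $3\sigma\sqrt{n}$; it also shifts every entry by a constant to illustrate Remark 3. It assumes NumPy; the sizes, the seed, the shift 0.5 and the helper name `sample_symmetric_omega` are arbitrary illustrative choices, and a single draw is of course only a sanity check, not a verification of the theorem.

```python
import numpy as np

def sample_symmetric_omega(n, p, rng):
    """Symmetric n x n matrix whose entries on and below the diagonal are i.i.d. from Omega."""
    vals = np.where(rng.random((n, n)) < p, 1.0, -p / (1.0 - p))
    lower = np.tril(vals)                 # diagonal and strictly lower part
    return lower + np.tril(vals, -1).T    # mirror the strictly lower part

rng = np.random.default_rng(0)
n, p = 400, 0.3
sigma = np.sqrt(p / (1 - p))
A = sample_symmetric_omega(n, p, rng)

norm_centered = np.linalg.norm(A, 2)        # spectral norm, mean-zero entries
norm_shifted = np.linalg.norm(A + 0.5, 2)   # same matrix with mean 0.5 per entry

print(norm_centered, 3 * sigma * np.sqrt(n))  # grows like sqrt(n)
print(norm_shifted, n)                        # grows like n
```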

A similar theorem due to Geman [8] is available for unsymmetric matrices.

Theorem 2.2 Let $A$ be a $\lceil yn \rceil \times n$ matrix whose entries are chosen according to $\Omega$ for fixed
$y \in \mathbb{R}_+$. Then, with probability at least $1 - c_1\exp(-c_2 n^{c_3})$ where $c_1 > 0$, $c_2 > 0$, and $c_3 > 0$
depend on $p$ and $y$,

\[ \|A\| \le c_4\sqrt{n} \]

for some $c_4 > 0$ also depending on $p, y$.

As in the case of [7], this theorem is not stated exactly this way in Geman's paper,
but can be deduced from the equations on pp. 255-256 by taking $k = n^q$ for a $q$ satisfying
$(2\alpha + 4)q < 1$.

The last theorem about random matrices requires a version of the well known Chernoff
bounds, which is as follows (see [15, Theorem 4.4]).

Theorem 2.3 (Chernoff Bounds) Let $X_1, \dots, X_k$ be a sequence of $k$ independent Bernoulli
trials, each succeeding with probability $p$ so that $E(X_i) = p$. Let $S = \sum_{i=1}^{k} X_i$ be the
binomially distributed variable describing the total number of successes. Then for $\delta > 0$,

\[ P(S > (1+\delta)pk) \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{pk}. \tag{1} \]

It follows that for all $a \in (0, p\sqrt{k})$,

\[ P(|S - pk| > a\sqrt{k}) \le 2\exp(-a^2/p). \tag{2} \]

The final theorem of this section is as follows. 

Theorem 2.4 Let $A$ be an $n \times N$ matrix whose entries are chosen according to $\Omega$. Let $\tilde A$
be defined as follows. For $(i,j)$ such that $A_{ij} = 1$, we define $\tilde A_{ij} = 1$. For entries $(i,j)$ such
that $A_{ij} = -p/(1-p)$, we take $\tilde A_{ij} = -n_j/(n - n_j)$, where $n_j$ is the number of 1's in column
$j$ of $A$. Then there exist $c_1 > 0$ and $c_2 \in (0,1)$ depending on $p$ such that

\[ P(\|A - \tilde A\|_F^2 \le c_1 N) \ge 1 - (2/3)^N - N c_2^n. \tag{3} \]



Remark 1. The notation $\|A\|_F$ denotes the Frobenius norm of $A$, that is, $(\sum_i \sum_j A_{ij}^2)^{1/2}$.
It is well known that $\|A\|_F \ge \|A\|$ for any $A$.

Remark 2. Note that $\tilde A$ is undefined if there is a $j$ such that $n_j = n$. In this case we assume
that $\|A - \tilde A\|_F = \infty$, i.e., the event considered in (3) fails.

Remark 3. Observe that the column sums of $A$ are random variables with mean zero since
the mean of the entries is 0. On the other hand, the column sums of $\tilde A$ are identically zero
deterministically; this is the rationale for the choice of $\tilde A_{ij} = -n_j/(n - n_j)$.
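
As an illustration of the construction in Theorem 2.4, the sketch below builds the column-corrected matrix from one sample of $A$ and checks two facts: the corrected column sums vanish, and the squared distance between corresponding columns equals $(n_j - pn)^2/((1-p)^2(n - n_j))$, where $n_j$ is the number of 1's in column $j$. It assumes NumPy; the function name `column_corrected` and the sample sizes are illustrative.

```python
import numpy as np

def column_corrected(A):
    """Replace the negative entries of column j by -n_j/(n - n_j), n_j = number of 1's in column j."""
    n = A.shape[0]
    ones = np.isclose(A, 1.0)
    n_j = ones.sum(axis=0)
    if np.any(n_j == n):
        # matches Remark 2: the corrected matrix is undefined for an all-ones column
        raise ValueError("corrected matrix undefined: some column consists only of 1's")
    fill = -n_j / (n - n_j)
    A_tilde = np.where(ones, 1.0, fill[np.newaxis, :])
    return A_tilde, n_j

rng = np.random.default_rng(1)
n, N, p = 50, 80, 0.3
A = np.where(rng.random((n, N)) < p, 1.0, -p / (1 - p))
A_tilde, n_j = column_corrected(A)

col_sq_diff = ((A - A_tilde) ** 2).sum(axis=0)
predicted = (n_j - p * n) ** 2 / ((1 - p) ** 2 * (n - n_j))
print(np.abs(A_tilde.sum(axis=0)).max(), np.abs(col_sq_diff - predicted).max())
```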

Proof: From the definition of $\tilde A$, for column $j$, there are exactly $n - n_j$ entries of $\tilde A$ that
differ from those of $A$. Furthermore, the difference of these entries is exactly
$(n_j - pn)/((1-p)(n - n_j))$. Therefore, for each $j = 1, \dots, N$, the contribution of column $j$ to the
squared norm difference $\|A - \tilde A\|_F^2$ is given by

\[ \|A(:,j) - \tilde A(:,j)\|^2 = \frac{(n_j - pn)^2}{(1-p)^2 (n - n_j)}. \]

Recall that the numbers $n_1, \dots, n_N$ are independent, and each is the result of $n$ Bernoulli
trials done with probability $p$.

We now define $\Psi$ to be the event that at least one $n_j$ is very far from the mean. In
particular, $\Psi$ is the event that there exists a $j \in \{1, \dots, N\}$ such that $n_j > qn$, where
$q = \min(\sqrt{p}, 2p)$. Let $\bar\Psi$ be its complement, and let $\psi(n_j)$ be the indicator of this complement
for column $j$ (i.e., $\psi(n_j) = 1$ if $n_j \le qn$, else $\psi(n_j) = 0$). Let $c$ be a positive scalar depending
on $p$ to be determined later. Observe that

\begin{align*}
P(\|A - \tilde A\|_F^2 \ge cN) &= P(\|A - \tilde A\|_F^2 \ge cN \wedge \bar\Psi) + P(\|A - \tilde A\|_F^2 \ge cN \wedge \Psi) \\
&\le P(\|A - \tilde A\|_F^2 \ge cN \wedge \bar\Psi) + P(\Psi). \tag{4}
\end{align*}

We now analyze the two terms separately. For the first term we use a technique attributed
to S. Bernstein (see Hoeffding [12]). Let $\phi$ be the indicator function of nonnegative reals,
i.e., $\phi(x) = 1$ for $x \ge 0$ while $\phi(x) = 0$ for $x < 0$. Then, in general, $P(u \ge 0) = E(\phi(u))$.
Thus,

\begin{align*}
P(\|A - \tilde A\|_F^2 \ge cN \wedge \bar\Psi) &= P(\|A - \tilde A\|_F^2 - cN \ge 0 \wedge \psi(n_1) = 1 \wedge \cdots \wedge \psi(n_N) = 1) \\
&= E(\phi(\|A - \tilde A\|_F^2 - cN) \cdot \psi(n_1) \cdots \psi(n_N)).
\end{align*}

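Let $h$ be a positive scalar depending on $p$ to be determined later. Observe that for any such
$h$ and for all $x \in \mathbb{R}$, $\phi(x) \le \exp(hx)$. Thus,

\begin{align*}
P(\|A - \tilde A\|_F^2 \ge cN \wedge \bar\Psi) &\le E\left(\exp(h\|A - \tilde A\|_F^2 - hcN) \cdot \psi(n_1) \cdots \psi(n_N)\right) \\
&= E\left(\exp\Big(h\sum_{j=1}^{N}\big(\|A(:,j) - \tilde A(:,j)\|^2 - c\big)\Big) \cdot \psi(n_1) \cdots \psi(n_N)\right) \\
&= E\left(\exp\Big(h\sum_{j=1}^{N}\Big(\frac{(n_j - pn)^2}{(1-p)^2(n - n_j)} - c\Big)\Big) \cdot \psi(n_1) \cdots \psi(n_N)\right) \tag{5} \\
&= f_1 \cdots f_N, \tag{6}
\end{align*}

where

\[ f_j = E\left(\exp\Big(h\Big(\frac{(n_j - pn)^2}{(1-p)^2(n - n_j)} - c\Big)\Big)\,\psi(n_j)\right). \]

To obtain (6), we used the independence of the $n_j$'s. Let us now analyze $f_j$ in isolation.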

\begin{align*}
f_j &= \sum_{i=0}^{n} \exp\Big(h\Big(\frac{(i - pn)^2}{(1-p)^2(n - i)} - c\Big)\Big)\,\psi(i)\,P(n_j = i) \\
&= \sum_{i=0}^{\lfloor qn \rfloor} \exp\Big(h\Big(\frac{(i - pn)^2}{(1-p)^2(n - i)} - c\Big)\Big)\,P(n_j = i) \\
&\le \sum_{i=0}^{\lfloor qn \rfloor} \exp\Big(h\Big(\frac{(i - pn)^2}{(1-p)^2(n - \sqrt{p}\,n)} - c\Big)\Big)\,P(n_j = i).
\end{align*}

To derive the last line, we used the fact that $i \le \sqrt{p}\,n$ since $i \le qn$. Now let us reorganize
this summation by considering first $i$ such that $|i - pn| < \sqrt{n}$, and next $i$ such that
$|i - pn| \in [\sqrt{n}, 2\sqrt{n})$, etc. Notice that, since $i \le qn \le 2pn$, we need consider intervals only
until $|i - pn|$ reaches $pn$.

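\begin{align*}
f_j &\le \sum_{k=0}^{\lfloor p\sqrt{n} \rfloor} \sum_{i:\,|i - pn| \in [k\sqrt{n},(k+1)\sqrt{n})} \exp\Big(h\Big(\frac{(i - pn)^2}{(1-p)^2(n - \sqrt{p}\,n)} - c\Big)\Big)\,P(n_j = i) \\
&\le \sum_{k=0}^{\lfloor p\sqrt{n} \rfloor} \sum_{i:\,|i - pn| \in [k\sqrt{n},(k+1)\sqrt{n})} \exp\Big(h\Big(\frac{(k+1)^2 n}{(1-p)^2(n - \sqrt{p}\,n)} - c\Big)\Big)\,P(n_j = i) \\
&= \sum_{k=0}^{\lfloor p\sqrt{n} \rfloor} \sum_{i:\,|i - pn| \in [k\sqrt{n},(k+1)\sqrt{n})} \exp\Big(h\Big(\frac{(k+1)^2}{(1-p)^2(1 - \sqrt{p})} - c\Big)\Big)\,P(n_j = i) \\
&\le \sum_{k=0}^{\lfloor p\sqrt{n} \rfloor} \exp\Big(h\Big(\frac{(k+1)^2}{(1-p)^2(1 - \sqrt{p})} - c\Big)\Big) \cdot 2\exp(-k^2/p),
\end{align*}

where, for the last line, we have applied (2). The theorem is valid since $k \le p\sqrt{n}$.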
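Continuing this derivation and overestimating the finite sum with an infinite sum,

\begin{align*}
f_j &\le 2\exp(-hc) \cdot \sum_{k=0}^{\infty} \exp\Big(\frac{h(k+1)^2}{(1-p)^2(1 - \sqrt{p})} - k^2/p\Big) \\
&= 2\exp\Big(\frac{h}{(1-p)^2(1 - \sqrt{p})} - hc\Big) + 2\exp(-hc) \cdot \sum_{k=1}^{\infty} \exp\Big[\frac{h(k+1)^2}{(1-p)^2(1 - \sqrt{p})} - k^2/p\Big].
\end{align*}

Choose $h$ so that $h/((1-p)^2(1 - \sqrt{p})) < 1/(8p)$, i.e., $h < (1-p)^2(1 - \sqrt{p})/(8p)$. Then the
second term in the square-bracket expression is at least twice the first term in magnitude for
all $k \ge 1$, hence

\[ f_j \le 2\exp\Big(\frac{h}{(1-p)^2(1 - \sqrt{p})} - hc\Big) + 2\exp(-hc) \cdot \sum_{k=1}^{\infty} \exp(-k^2/(2p)). \tag{7} \]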



Observe that $\sum_{k=1}^{\infty} \exp(-k^2/(2p))$ is dominated by a geometric series and hence is a finite
number depending on $p$. Thus, once $h$ is selected, it is possible to choose $c$ sufficiently large
so that each of the two terms in (7) is at most 1/3. Thus, with appropriate choices of $h$ and
$c$, we conclude that $f_j \le 2/3$. Thus, substituting this into (6) shows that

\[ P(\|A - \tilde A\|_F^2 \ge cN \wedge \bar\Psi) \le (2/3)^N. \]



We now turn to the second term in (4). For a particular $j$, the probability that $n_j > qn$
is bounded using (1) by $v_p^n$ where $v_p = (e^{\delta}/(1+\delta)^{(1+\delta)})^p$, where $\delta = q/p - 1$, i.e.,
$\delta = \min(1, 1/\sqrt{p} - 1)$. Then the union bound asserts that the probability that any $j$ satisfies
$n_j > qn$ is at most $N v_p^n$. Thus,

\[ P(\|A - \tilde A\|_F^2 \ge cN) \le (2/3)^N + N v_p^n. \]

This concludes the proof.



3 Maximum Clique 

Let G = (V, E) be a simple graph. The maximum clique problem focuses on finding the 
largest clique of graph G, i.e., the largest complete subgraph of G. For any clique K of G, 
the adjacency matrix of the graph $K'$ obtained by taking the union of $K$ and the set of loops
for each $v \in V(K)$ is a rank-one matrix with 1's in the entries indexed by $V(K) \times V(K)$
and 0's everywhere else. Therefore, a clique $K$ of $G$ containing $n$ vertices can be found by
solving the rank minimization problem

\begin{align*}
\min \quad & \mathrm{rank}(X) \\
\text{s.t.} \quad & \sum_{i \in V}\sum_{j \in V} X_{ij} \ge n^2, \tag{9} \\
& X_{ij} = 0 \quad \forall (i,j) \notin E \text{ with } i \ne j, \tag{10} \\
& X \in [0,1]^{V \times V}. \tag{11}
\end{align*}

Unfortunately, this rank minimization problem is also NP-hard. We consider the relaxation
obtained by replacing the objective function with the nuclear norm, the sum of the singular
values of the matrix: $\|X\|_* = \sigma_1(X) + \cdots + \sigma_N(X)$.

Replacing $\mathrm{rank}(X)$ with $\|X\|_*$, we obtain the following convex optimization problem:

\begin{align*}
\min \quad & \|X\|_* \\
\text{s.t.} \quad & \sum_{i \in V}\sum_{j \in V} X_{ij} \ge n^2, \tag{12} \\
& X_{ij} = 0 \quad \text{if } (i,j) \notin E \text{ and } i \ne j.
\end{align*}

Notice that the relaxation has dropped the constraint $X_{ij} \le 1$ that was present in the original
formulation. This constraint turns out to be superfluous (and, in fact, unhelpful; see the
remark following (20)) for our approach. Using the Karush-Kuhn-Tucker conditions, we
derive conditions under which the adjacency matrix of a graph comprising a clique of $G$ of size
$n$ together with a loop at each of its $n$ vertices is optimal for this convex relaxation.

3.1 Optimality Conditions 

In this section, we prove a theorem that gives sufficient conditions for optimality and
uniqueness of a solution to (12). These conditions involve multipliers $\lambda_{ij}$ and $\mu$ and a matrix $W$.
In subsequent subsections we explain how to select $\lambda_{ij}$, $\mu$, and $W$ based on the underlying
graph to satisfy the conditions.

Recall that if $f : \mathbb{R}^n \to \mathbb{R}$ is a convex function, then a subgradient of $f$ at a point $x$ is
defined to be a vector $g \in \mathbb{R}^n$ such that for all $y \in \mathbb{R}^n$, $f(y) - f(x) \ge g^T(y - x)$. It is a
well-known theorem that for a convex $f$ and for every $x \in \mathbb{R}^n$, the set of subgradients forms
a nonempty closed convex set. This set of subgradients, called the subdifferential, is denoted
as $\partial f(x)$.

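In this section we consider the following generalization of (12) because it will also arise
in our discussion of biclique below:

\begin{align*}
\min \quad & \|X\|_* \\
\text{s.t.} \quad & \sum_{i=1}^{M}\sum_{j=1}^{N} X_{ij} \ge mn, \tag{13} \\
& X_{ij} = 0 \quad \text{for } (i,j) \in \bar E.
\end{align*}

Here, $X \in \mathbb{R}^{M \times N}$, $E$ is a subset of $\{1, \dots, M\} \times \{1, \dots, N\}$, and the complement of $E$ is
denoted $\bar E$.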

The following lemma characterizes the subdifferential of $\|\cdot\|_*$ (see [4, Equation 3.4] and
also [18]).

Lemma 3.1 Suppose $A \in \mathbb{R}^{m \times n}$ has rank $r$ with singular value decomposition
$A = \sum_{k=1}^{r} \sigma_k u_k v_k^T$. Then $\phi$ is a subgradient of $\|\cdot\|_*$ at $A$ if and only if $\phi$ is of the form

\[ \phi = \sum_{k=1}^{r} u_k v_k^T + W, \]

where $W$ satisfies $\|W\| \le 1$ such that the column space of $W$ is orthogonal to $u_k$ and the
row space of $W$ is orthogonal to $v_k$ for all $k = 1, 2, \dots, r$.

Let $I$ be a subset of $\{1, \dots, N\}$. We say that $u \in \mathbb{R}^N$ is the characteristic vector of $I$ if
$u_i = 1$ for $i \in I$ while $u_i = 0$ for $i \in \{1, \dots, N\} - I$.

Let $U^*$ be a subset of $\{1, \dots, M\}$ and $V^*$ a subset of $\{1, \dots, N\}$, and let $u$, $v$ be their
characteristic vectors respectively. Suppose $|U^*| = m$ and $|V^*| = n$ with $m > 0$, $n > 0$. Let
$X^* = uv^T$, an $M \times N$ matrix. Clearly $X^*$ has rank 1. Note that Lemma 3.1 implies that

\[ \partial\|\cdot\|_*(X^*) = \{uv^T/\sqrt{mn} + W : Wv = 0,\ W^T u = 0,\ \|W\| \le 1\}. \tag{14} \]

This leads to the main theorem for this section. 

Theorem 3.1 Let $U^*$ be a subset of $\{1, \dots, M\}$ of cardinality $m$, and let $V^*$ be a subset of
$\{1, \dots, N\}$ of cardinality $n$. Let $u$ and $v$ be the characteristic vectors of $U^*$, $V^*$ respectively.
Let $X^* = uv^T$. Suppose $X^*$ is feasible for (13). Suppose also that there exist $W \in \mathbb{R}^{M \times N}$,
$\lambda \in \mathbb{R}^{M \times N}$ and $\mu \in \mathbb{R}_+$ such that $Wv = 0$, $u^T W = 0$, $\|W\| \le 1$ and

\[ \frac{uv^T}{\sqrt{mn}} + W = \mu ee^T + \sum_{(i,j) \in \bar E} \lambda_{ij} e_i e_j^T. \tag{15} \]

Here, $e$ denotes the vector of all 1's while $e_i$ denotes the $i$th column of the identity matrix
(either in $\mathbb{R}^M$ or $\mathbb{R}^N$). Then $X^*$ is an optimal solution to (13). Moreover, for any
$I \subset \{1, \dots, M\}$ and $J \subset \{1, \dots, N\}$ such that $I \times J \subset E$, $|I| \cdot |J| \le mn$.

Furthermore, if $\|W\| < 1$ and $\mu > 0$, then $X^*$ is the unique optimizer of (13) (and hence
will be found if a solver is applied to (13)).

Proof: The fact that $X^*$ is optimal is a straightforward application of the well-known KKT
conditions. Nonetheless, we now explicitly prove optimality because the inequalities in the
proof are useful for the uniqueness proof below.

Suppose $X$ is another matrix feasible for (13). We wish to show that $\|X\|_* \ge \|X^*\|_*$. To
prove this, we use the definition of subgradient followed by (15). The notation $A \bullet B$ is used
to denote the elementwise inner product of two matrices $A$, $B$.

\begin{align*}
\|X\|_* - \|X^*\|_* &\ge (uv^T/\sqrt{mn} + W) \bullet (X - X^*) \tag{16} \\
&= \mu(ee^T) \bullet (X - X^*) + \sum_{(i,j) \in \bar E} \lambda_{ij}(e_i e_j^T) \bullet (X - X^*) \tag{17} \\
&= \mu((ee^T) \bullet X - mn) \tag{18} \\
&\ge 0. \tag{19}
\end{align*}

Equation (16) follows by the definition of subgradient and (14); (17) follows from (15); and
(18) follows from the fact that $(ee^T) \bullet X^* = mn$ by definition of $X^*$ and
$(e_i e_j^T) \bullet X = (e_i e_j^T) \bullet X^* = 0$ for $(i,j) \in \bar E$ by feasibility. Finally, (19) follows since
$\mu \ge 0$ and $(ee^T) \bullet X \ge mn$ by feasibility. This proves that $X^*$ is an optimal solution to (13).

Now consider $(I, J)$ such that $I \times J \subset E$. Then $X' = u'(v')^T \cdot mn/(|I| \cdot |J|)$, where $u'$ is the
characteristic vector of $I$ and $v'$ is the characteristic vector of $J$, is also a feasible solution to
(13). Recall that for a matrix of the form $uv^T$, the unique nonzero singular value (and hence
the nuclear norm) equals $\|u\| \cdot \|v\|$. Thus, $\|X'\|_* = mn/(|I| \cdot |J|)^{1/2}$ and $\|X^*\|_* = \sqrt{mn}$. Since
$X^*$ is optimal, $\|X'\|_* \ge \|X^*\|_*$, i.e., $\sqrt{mn} \le mn/(|I| \cdot |J|)^{1/2}$. Simplifying yields $|I| \cdot |J| \le mn$.
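
The scaling argument above can be illustrated numerically. In the following sketch (NumPy assumed; the index sets, sizes and the helper `scaled_block` are made-up illustrations), a block $I \times J$ with $|I| \cdot |J| < mn$ must be scaled up to reach entry sum $mn$, and its nuclear norm $mn/(|I| \cdot |J|)^{1/2}$ then exceeds $\sqrt{mn}$.

```python
import numpy as np

def scaled_block(M, N, I, J, total):
    """M x N matrix supported on I x J with equal entries summing to `total`."""
    X = np.zeros((M, N))
    X[np.ix_(I, J)] = total / (len(I) * len(J))
    return X

def nuc(X):
    """Nuclear norm as the sum of singular values."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

M, N = 6, 7
U_star, V_star = [0, 1, 2], [0, 1, 2, 3]            # m = 3, n = 4, so mn = 12
m, n = len(U_star), len(V_star)
X_star = scaled_block(M, N, U_star, V_star, m * n)  # equals u v^T
I, J = [3, 4], [4, 5, 6]                            # a smaller block, |I||J| = 6
X_prime = scaled_block(M, N, I, J, m * n)           # rescaled to the same entry sum

print(nuc(X_star), nuc(X_prime))  # sqrt(12) versus 12/sqrt(6)
```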

Now finally we turn to the uniqueness of $X^*$, which is the most complicated part of the
proof. This argument requires a preliminary claim. Let $S_1$ denote the subspace of $M \times N$
matrices $Z_1$ such that $u^T Z_1 = 0$ and $Z_1 v = 0$. Let $S_2$ denote the subspace of $M \times N$ matrices
that can be written in the form $xv^T$, where $x \in \mathbb{R}^M$ has all zeros in positions indexed by
$U^*$. Let $S_3$ denote the subspace of $M \times N$ matrices that can be written in the form $uy^T$,
where $y \in \mathbb{R}^N$ has all zeros in positions indexed by $V^*$. Let $S_4$ denote the subspace of all
$M \times N$ matrices that can be written in the form $uy^T + xv^T$, where $x$ has nonzeros only in
positions indexed by $U^*$, $y$ has nonzeros only in positions indexed by $V^*$, and the sum of
entries of $uy^T + xv^T$ is zero. Finally, let $S_5$ be the subspace of $M \times N$ matrices of the form
$\alpha uv^T$, where $\alpha$ is a scalar.

The preliminary claim is that $S_1, \dots, S_5$ are mutually orthogonal and that
$S_1 \oplus \cdots \oplus S_5 = \mathbb{R}^{M \times N}$. To check orthogonality, we proceed case by case. For example, if
$Z_1 \in S_1$ and $Z_2 \in S_2$, then $Z_2 = xv^T$ so $Z_1 \bullet Z_2 = Z_1 \bullet (xv^T) = x^T Z_1 v = 0$ since $Z_1 v = 0$.
The identity $Z \bullet (xy^T) = x^T Z y$ similarly shows that $Z_1$ is orthogonal to all of $S_2, \dots, S_5$.
Next, observe that $Z_2 \in S_2$ has nonzero entries only in positions indexed by $\bar U^* \times V^*$, where
$\bar U^*$ denotes $\{1, \dots, M\} - U^*$ and $\bar V^*$ denotes $\{1, \dots, N\} - V^*$. Similarly, $Z_3 \in S_3$ has nonzero
entries only in positions indexed by $U^* \times \bar V^*$, and $Z_4 \in S_4$ and $Z_5 \in S_5$ have nonzero entries
only in positions indexed by $U^* \times V^*$. Thus, the nonzero entries of $S_2$, $S_3$ and $S_4 \oplus S_5$ are
disjoint, and hence these spaces are mutually orthogonal. The only remaining case is to show
that $S_4$ and $S_5$ are orthogonal; this follows because a matrix in $S_5$ is a multiple of the all 1's
matrix in positions indexed by $U^* \times V^*$, while the entries of a matrix in $S_4$, also only in
positions indexed by $U^* \times V^*$, sum to 0.

Now we must show that $S_1 \oplus \cdots \oplus S_5 = \mathbb{R}^{M \times N}$. Select a $Z \in \mathbb{R}^{M \times N}$. We first split off
an $S_5$ component: let $\alpha = u^T Z v/((u^T u)(v^T v))$ and define $Z_5 = \alpha uv^T$. Then $Z_5 \in S_5$. Let
$\bar Z = Z - Z_5$. One checks from the definition of $\alpha$ that $u^T \bar Z v = 0$. It remains to write $\bar Z$ as
a matrix in $S_1 \oplus \cdots \oplus S_4$.

Next we split off an $S_1$ component. Let $x = \bar Z v/v^T v$ and $y = \bar Z^T u/u^T u$. Observe that
$u^T x = u^T \bar Z v/v^T v = 0$. Similarly, $v^T y = 0$. Let $\tilde Z = xv^T + uy^T$ and $Z_1 = \bar Z - \tilde Z$. Then

\begin{align*}
Z_1 v &= \bar Z v - \tilde Z v \\
&= \bar Z v - x v^T v - u y^T v \\
&= \bar Z v - x v^T v \\
&= 0,
\end{align*}

where the third line follows because $v^T y = 0$ and the fourth by definition of $x$. Similarly,
$Z_1^T u = 0$. Thus, $Z_1 \in S_1$.

It remains to split $\tilde Z$ among $S_2$, $S_3$ and $S_4$. Write $x = x_1 + x_2$, where $x_1$ is nonzero only
in entries indexed by $U^*$ while $x_2$ is nonzero only in entries indexed by $\bar U^*$. Similarly, split
$y = y_1 + y_2$ using $V^*$ and $\bar V^*$. Then $\tilde Z = x_1 v^T + x_2 v^T + u y_1^T + u y_2^T$. Then $x_2 v^T \in S_2$ and
$u y_2^T \in S_3$, so define $Z_2 = x_2 v^T$ and $Z_3 = u y_2^T$. Finally, we must consider the remaining term
$Z_4 = \tilde Z - x_2 v^T - u y_2^T = x_1 v^T + u y_1^T$. This has the form required for membership in $S_4$, but
it remains to verify that the entries of $Z_4$ sum to zero. This is shown as follows:

\begin{align*}
Z_4 \bullet (ee^T) &= Z_4 \bullet (uv^T) \\
&= u^T Z_4 v \\
&= (u^T x_1)(v^T v) + (u^T u)(y_1^T v) \\
&= (u^T x)(v^T v) + (u^T u)(y^T v) \\
&= 0 + 0.
\end{align*}

The first line follows because $Z_4$ is all zeros outside entries indexed by $U^* \times V^*$. The
fourth line follows because $u$ is zero outside $U^*$ and similarly for $v$. The last line follows
from equalities derived in the previous paragraph.

This concludes the proof of the claim that $S_1, \dots, S_5$ split $\mathbb{R}^{M \times N}$ into mutually
orthogonal subspaces.
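
The splitting used in this claim is constructive, so it can be traced step by step on a random matrix. The sketch below (NumPy assumed; `split_into_subspaces` and the sample index sets are illustrative names and values) follows the construction in the proof and then checks that the five pieces sum to $Z$ and are pairwise orthogonal in the elementwise inner product.

```python
import numpy as np

def split_into_subspaces(Z, u, v):
    """Return (Z1, ..., Z5) with Z = Z1 + ... + Z5, for 0/1 characteristic vectors u, v."""
    alpha = (u @ Z @ v) / ((u @ u) * (v @ v))
    Z5 = alpha * np.outer(u, v)                  # S5 component
    Zbar = Z - Z5
    x = Zbar @ v / (v @ v)
    y = Zbar.T @ u / (u @ u)
    Z1 = Zbar - np.outer(x, v) - np.outer(u, y)  # S1 component
    x1, x2 = x * u, x * (1 - u)                  # x restricted to U* and to its complement
    y1, y2 = y * v, y * (1 - v)                  # y restricted to V* and to its complement
    Z2 = np.outer(x2, v)                         # S2 component
    Z3 = np.outer(u, y2)                         # S3 component
    Z4 = np.outer(x1, v) + np.outer(u, y1)       # S4 component
    return Z1, Z2, Z3, Z4, Z5

rng = np.random.default_rng(0)
u = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # U* = first two rows
v = np.array([1.0, 1.0, 1.0, 0.0])        # V* = first three columns
Z = rng.standard_normal((5, 4))
parts = split_into_subspaces(Z, u, v)

gram = [[float(np.sum(a * b)) for b in parts] for a in parts]
print(np.round(gram, 10))  # off-diagonal entries vanish up to rounding
```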

Now we prove the uniqueness of $X^*$ under the assumption that $\mu > 0$ and $\|W\| < 1$. Let
$X$ be a feasible solution different from $X^*$. Write $X - X^* = Z_1 + \cdots + Z_5$, where $Z_1, \dots, Z_5$
lie in $S_1, \dots, S_5$ respectively. Now we consider several cases.

The first case is that $Z_1 \ne 0$. Then since $\|W\| < 1$ and $Z_1 v = 0$, $Z_1^T u = 0$, it follows
from Lemma 3.1 that $uv^T/\sqrt{mn} + W + \epsilon Z_1$ lies in $\partial\|\cdot\|_*(X^*)$ for $\epsilon > 0$ sufficiently small.
This means that $W$ appearing in (16) above may be replaced by $W + \epsilon Z_1$ without harming the
validity of the inequality. This adds the term $\epsilon Z_1 \bullet (X - X^*)$ to the right-hand sides of the
inequalities following (16). Observe that $Z_1 \bullet (X - X^*) = Z_1 \bullet (Z_1 + \cdots + Z_5) = Z_1 \bullet Z_1 > 0$.
Thus, a positive quantity is added to all these right-hand sides, so we conclude
$\|X\|_* - \|X^*\|_* > 0$.

For the remaining cases, we assume $Z_1 = 0$. We claim that $Z_2 = Z_3 = 0$ as well. For
example, suppose $Z_2 = xv^T$. Recall that $Z_2$ is nonzero only for entries indexed by $\bar U^* \times V^*$
(and in particular, $x$ must be zero on $U^*$). Since all of $Z_3$, $Z_4$ and $Z_5$ are zero in $\bar U^* \times V^*$,
$Z_2(i,j) = X_{ij} - X^*_{ij}$ for $(i,j) \in \bar U^* \times V^*$. Select an $i \in \bar U^*$; we claim that there exists a
$j \in V^*$ such that $(i,j) \notin E$. If not, then $(U^* \cup \{i\}) \times V^*$ would define a solution to (13)
with greater cardinality (and hence lower objective value) than $U^* \times V^*$, but we have already
proven that $U^* \times V^*$ defines the optimal solution. Thus, there is a constraint in (13) of the
form $X_{ij} = 0$ that must be satisfied by both $X$ and $X^*$. This means that the $(i,j)$ entry of
$Z_2$ is zero. On the other hand, this entry is $x_i v_j = x_i$. Thus, we conclude $x_i = 0$. Therefore,
$x = 0$ so $Z_2$ vanishes. The same argument shows $Z_3$ vanishes.

The last case is thus that $Z_1$, $Z_2$ and $Z_3$ are all zero, so at least one of $Z_4$ or $Z_5$ must
be nonzero. Since the sum of entries of $Z_4$ is zero and $X$ is feasible (and, in particular,
feasible for the constraint $X \bullet (ee^T) \ge mn$), it follows that the sum of entries of $Z_5$ must
be nonnegative, i.e., $Z_5 = \alpha uv^T$ with $\alpha \ge 0$. If $\alpha > 0$ then we are finished with the
proof: the assumptions $\mu > 0$ and $\alpha > 0$ imply that both factors in (18) are positive, hence
$\|X\|_* - \|X^*\|_* > 0$.

Thus, we may assume that $Z_5 = 0$ so $Z_4 \ne 0$. Recall that $Z_4$ is nonzero only in positions
indexed by $U^* \times V^*$. We can now draw the following conclusions about the singular values of
$X$ versus those of $X^*$. Recall that the rank of $X^*$ is one, and its sole nonzero singular value
is $\sqrt{mn}$ and hence $\|X^*\|_F = \|X^*\| = \|X^*\|_* = \sqrt{mn}$. Observe that the sum of entries of $X$,
namely, $u^T X v$, is also $mn$. But $u^T X v \le \|u\| \cdot \|X\| \cdot \|v\| = \|X\|\sqrt{mn}$. Thus, $\|X\| \ge \sqrt{mn}$,
i.e., $\sigma_1(X) \ge \sigma_1(X^*)$, where $\sigma_k(A)$ is notation for the $k$th singular value of matrix $A$.

Next, note that $\|X\|_F > \|X^*\|_F$ for the following reason. Recall that the Frobenius norm
is equivalent to the Euclidean vector norm applied to the matrix when regarded as a vector.
Furthermore, when regarded as a vector, $X$ is the sum of two orthogonal components, namely
$X^*$ and $Z_4$. Therefore, by the Pythagorean theorem, $\|X\|_F = (\|X^*\|_F^2 + \|Z_4\|_F^2)^{1/2}$. Since
$Z_4 \ne 0$, $\|X\|_F > \|X^*\|_F$.

Thus, we know that $\sigma_1(X) \ge \sigma_1(X^*)$ and that $\sigma_1(X)^2 + \sigma_2(X)^2 > \sigma_1(X^*)^2$ (note that
$X = X^* + Z_4$ has rank at most two, since $Z_4$ has the form $uy^T + xv^T$). These two
inequalities imply that $\sigma_1(X) + \sigma_2(X) > \sigma_1(X^*)$, and therefore $\|X\|_* > \|X^*\|_*$.

Thus, we have shown that in all cases, if $\|W\| < 1$, $\mu > 0$ and $X$ is a feasible point
distinct from $X^*$, then $\|X\|_* > \|X^*\|_*$. This proves that $X^*$ is the unique optimizer.

This theorem immediately specializes to the following theorem if we take the case that
$G$ is an $N$-node undirected graph, that $M = N$, $m = n$, and $E = E(G) \cup \{(i,i) : i \in V(G)\}$.

Theorem 3.2 Let $V^*$ be the nodes of an $n$-node clique contained in an $N$-node undirected
graph $G = (V, E)$. Let $v \in \mathbb{R}^V$ be the characteristic vector of $V^*$. Let $X^* = vv^T$. (Clearly
$X^*$ is feasible for (12).) Suppose also that there exist $W \in \mathbb{R}^{V \times V}$, $\lambda \in \mathbb{R}^{V \times V}$ and $\mu \in \mathbb{R}_+$
such that $Wv = 0$, $v^T W = 0$, $\|W\| \le 1$ and

\[ \frac{vv^T}{n} + W = \mu ee^T + \sum_{(i,j) \notin E,\ i \ne j} \lambda_{ij} e_i e_j^T. \tag{20} \]

Then $X^*$ is an optimal solution to (12). Moreover, $V^*$ is a maximum clique of $G$. Furthermore,
if $\|W\| < 1$ and $\mu > 0$, then $X^*$ is the unique optimizer of (12) and $V^*$ is the unique
maximum clique of $G$.

Remark: It may appear that we need to know the value of $n$ prior to applying the theorem
since $n$ is present in the statement of (12). In fact, this is not the case: we observe that
the factor $n^2$ appearing in (12) is the sole inhomogeneity in the problem. This means that
we obtain the same solution, rescaled in the appropriate way, if we replace $n^2$ by 1 in (12).
Thus, $n$ does not need to be known in advance to apply this theorem.

For the next two subsections, we consider two scenarios for constructing $G$ and try to
find $X^*$, $W$ and values for the multipliers to satisfy the conditions of the previous theorem.
For both subsections, we use the following choices. We take $\mu = 1/n$ where $n = |V^*|$. We
define $W$ and $\lambda$ by considering the following cases:

($\omega_1$) If $(i,j) \in V^* \times V^*$, we choose $W_{ij} = 0$ and $\lambda_{ij} = 0$. In this case, the entries on either
side of (20) corresponding to this case become $1/n + 0 = 1/n + 0$.

($\omega_2$) If $(i,j) \in E - (V^* \times V^*)$ such that $i \ne j$, then we choose $W_{ij} = 1/n$ and $\lambda_{ij} = 0$. Then
the two sides of (20) become $0 + 1/n = 1/n + 0$.

($\omega_3$) If $i \notin V^*$, we set $W_{ii} = 1/n$. Again the two sides of (20) become $0 + 1/n = 1/n + 0$.

($\omega_4$) If $(i,j) \notin E$, $i \notin V^*$, $j \notin V^*$, then we choose $W_{ij} = -\gamma/n$ and $\lambda_{ij} = -(1+\gamma)/n$ for
some constant $\gamma \in \mathbb{R}$. The two sides of (20) become $0 - \gamma/n = 1/n - (1+\gamma)/n$. The
value of $\gamma$ is specified below.

($\omega_5$) If $(i,j) \notin E$, $i \in V^*$, $j \notin V^*$, then we choose

\[ W_{ij} = -\frac{p_j}{n(n - p_j)}, \qquad \lambda_{ij} = -\frac{1}{n} - \frac{p_j}{n(n - p_j)}, \]

where $p_j$ is equal to the number of edges in $E$ from $j$ to $V^*$.

($\omega_6$) If $(i,j) \notin E$, $i \notin V^*$, $j \in V^*$, then choose $W_{ij}$, $\lambda_{ij}$ symmetrically with the previous case.

First, observe that $Wv = 0$. Indeed, for entries $i \in V^*$, $W(i,:)v = 0$ since $W(i, V^*) = 0$
for such entries. For entries $i \in V - V^*$,

\[ W(i,:)v = p_i \cdot \frac{1}{n} - (n - p_i)\frac{p_i}{n(n - p_i)} = 0 \]

by our special choice of $W(i,j)$ in cases 5 and 6.
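
The case analysis above translates directly into a construction. The following sketch (NumPy assumed; `build_multipliers`, `adjacency` and the six-vertex example graph are illustrative, not part of the paper) fills $W$ and $\lambda$ entry by entry with $\mu = 1/n$, then checks $Wv = 0$ and the identity (20). It does not check $\|W\| < 1$, which is the subject of the next two subsections.

```python
import numpy as np

def adjacency(N, edges):
    """Symmetric boolean adjacency matrix without loops."""
    adj = np.zeros((N, N), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    return adj

def build_multipliers(adj, clique, gamma=0.0):
    """W, lambda and mu chosen according to cases (omega_1)-(omega_6)."""
    N, n = adj.shape[0], len(clique)
    in_clique = np.zeros(N, dtype=bool)
    in_clique[clique] = True
    p_cnt = adj[:, in_clique].sum(axis=1)   # p_cnt[j] = edges from j into the clique
    W, lam = np.zeros((N, N)), np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if in_clique[i] and in_clique[j]:             # (omega_1)
                continue
            if i == j or adj[i, j]:                       # (omega_3), (omega_2)
                W[i, j] = 1.0 / n
            elif not in_clique[i] and not in_clique[j]:   # (omega_4)
                W[i, j] = -gamma / n
                lam[i, j] = -(1.0 + gamma) / n
            else:                                         # (omega_5), (omega_6)
                k = j if in_clique[i] else i              # the non-clique endpoint
                W[i, j] = -p_cnt[k] / (n * (n - p_cnt[k]))
                lam[i, j] = -1.0 / n + W[i, j]
    return W, lam, 1.0 / n

clique = [0, 1, 2]
adj = adjacency(6, [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (4, 5)])
W, lam, mu = build_multipliers(adj, clique, gamma=0.0)
v = np.zeros(6)
v[clique] = 1.0
lhs = np.outer(v, v) / len(clique) + W
rhs = mu * np.ones((6, 6)) + lam
print(np.abs(W @ v).max(), np.abs(lhs - rhs).max())
```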

It remains to determine which graphs $G$ yield $W$ as defined by ($\omega_1$)-($\omega_6$) such that
$\|W\| < 1$. We present two different analyses.



3.2 The Adversarial Case 

Suppose that the edge set of the graph $G = (V, E)$ is generated as follows. We first add a
clique $K_{V^*}$ with vertex set $V^*$ of size $n$. Then, an adversary is allowed to add a number of
the remaining $|V|(|V| - 1)/2 - n(n - 1)/2$ potential edges to the graph. We will show that,
under certain conditions, our adversary can add up to $O(n^2)$ edges to the graph and $K_{V^*}$ will
still be the unique maximum clique of $G$.

We first introduce the following notation. Let $W^D \in \mathbb{R}^{V \times V}$ denote the matrix with
diagonal entries equal to the diagonal entries of $W$ and all other entries equal to 0. Let
$W^{ND}$ be the matrix whose nondiagonal entries are equal to the corresponding nondiagonal
entries of $W$ and whose diagonal entries are equal to 0. So $W = W^D + W^{ND}$.

Now suppose $G = (V, E)$ contains a clique $K_{V^*}$ of size $n$ with vertices indexed by
$V^* \subseteq V$. Moreover, suppose that $G$ contains at most $r$ edges not in $K_{V^*}$ and each vertex in
$V - V^*$ is adjacent to at most $\delta n$ vertices in $V^*$ for some $\delta \in (0, 1)$. Consider $W$ as defined
by ($\omega_1$)-($\omega_6$) with $\gamma = 0$. By the triangle inequality,

\[ \|W\|^2 \le (\|W^D\| + \|W^{ND}\|)^2 \le 2(\|W^D\|^2 + \|W^{ND}\|^2) = 2(1/n^2 + \|W^{ND}\|^2) \]

since $\|W^D\| = 1/n$. Applying the bound $\|W^{ND}\| \le \|W^{ND}\|_F$, it suffices to determine which values
of $r$ yield

\[ \|W^{ND}\|_F^2 = 2\|W(V^*, V - V^*)\|_F^2 + \|W^{ND}(V - V^*, V - V^*)\|_F^2 < (n^2 - 2)/(2n^2) \]

since, by the symmetry of $W$,

\[ W^{ND}(V^*, V - V^*) = W(V^*, V - V^*) = W(V - V^*, V^*)^T. \]

The diagonal entries of $W^{ND}(V - V^*, V - V^*)$ are equal to 0 and at most $2r$ of the remaining
entries are equal to $1/n$. Therefore,

\[ \|W^{ND}(V - V^*, V - V^*)\|_F^2 \le 2r/n^2. \]

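Moreover, since $n - p_j \ge (1 - \delta)n$,

\begin{align*}
\|W(V^*, V - V^*)\|_F^2 &= \sum_{j \in V - V^*} \left(p_j \frac{1}{n^2} + (n - p_j)\frac{p_j^2}{(n - p_j)^2 n^2}\right) \\
&= \sum_{j \in V - V^*} \left(\frac{p_j}{n^2} + \frac{p_j^2}{(n - p_j) n^2}\right) \\
&\le \sum_{j \in V - V^*} \left(\frac{p_j}{n^2} + \frac{\delta n\, p_j}{(1 - \delta) n^3}\right) \\
&= \frac{1}{1 - \delta} \sum_{j \in V - V^*} \frac{p_j}{n^2} \\
&\le \left(\frac{1}{1 - \delta}\right)\frac{r}{n^2}.
\end{align*}

Thus, the optimality and uniqueness conditions given by Theorem 3.1 are satisfied by $X^*$ if

\[ \left(1 + \frac{1}{1 - \delta}\right) r < (n^2 - 2)/4. \]

Equivalently,

\[ r < \frac{(1 - \delta)(n^2 - 2)}{4(2 - \delta)}. \]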

Therefore, $G$ can contain up to $O(n^2)$ edges other than those in $V^* \times V^*$, and yet $V^*$ will
remain the unique maximum clique of $G$.
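
To get a feel for the constants, the sufficient condition $(1 + 1/(1-\delta))\,r < (n^2 - 2)/4$ derived above can be turned into an explicit edge budget. The helper below is a small sketch using only the standard library; the name `adversarial_edge_budget` is ours, and the value it returns is only what this particular analysis guarantees, not a tight threshold.

```python
import math

def adversarial_edge_budget(n, delta):
    """Largest integer r with (1 + 1/(1 - delta)) * r < (n**2 - 2) / 4."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    bound = (1 - delta) * (n * n - 2) / (4 * (2 - delta))
    # strict inequality: take the largest integer strictly below the bound
    return max(math.ceil(bound) - 1, 0)

# Clique of 100 vertices, each outside vertex adjacent to at most half of it.
r = adversarial_edge_budget(100, 0.5)
print(r)
```

For a fixed $\delta$ the budget grows quadratically in $n$, in line with the $O(n^2)$ statement.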

Note that these bounds are the best possible up to the constant factors. In particular, if
the adversary were able to insert $(n + 1)(n + 2)/2$ edges, then a new clique could be created
larger than the planted clique. Thus, the adversary must be limited to $\mathrm{const} \cdot n^2$ edges for
$\mathrm{const} < 1/2$. Similarly, if the adversary could join a nonclique vertex to $n$ clique vertices,
then the adversary would have enlarged the clique. Thus, the restriction that a nonclique
vertex is adjacent to at most $\mathrm{const} \cdot n$ clique vertices is the best possible.

3.3 The Randomized Case



Let $V$ be a set of vertices with $|V| = N$ and consider a subset $V^* \subseteq V$ such that $|V^*| = n$.
We construct the edge set $E$ of the graph $G = (V, E)$ as follows:

($\Gamma_1$) For all $(i, j) \in V^* \times V^*$, $(i, j) \in E$.

($\Gamma_2$) Each of the remaining $N(N - 1)/2 - n(n - 1)/2$ possible edges is added to $E$ independently
at random with probability $p \in [0, 1)$.

Notice that, by our construction of $E$, $G$ contains a clique of size $n$ with vertices indexed by
$V^*$. We wish to determine which $n$, $N$ yield $G$ as constructed by ($\Gamma_1$) and ($\Gamma_2$) such that
with high probability $X^* = vv^T$ is optimal for the convex relaxation of the clique problem
given by (12). The following theorem states the desired result.
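The construction ($\Gamma_1$), ($\Gamma_2$) can be sampled directly. The sketch below is a minimal illustration using only the Python standard library; the function name `planted_clique_graph` and the convention that the first $n$ vertex labels play the role of $V^*$ are assumptions made here for convenience, not part of the paper.

```python
import random
from itertools import combinations


def planted_clique_graph(N, n, p, seed=None):
    """Sample a graph on vertices 0..N-1 with a planted clique.

    Returns (edges, clique), where edges is a set of pairs (i, j) with i < j
    and clique = {0, ..., n-1} stands in for V* (a labelling assumption).
    """
    if not (0 <= n <= N) or not (0.0 <= p < 1.0):
        raise ValueError("need 0 <= n <= N and p in [0, 1)")
    rng = random.Random(seed)
    clique = set(range(n))
    edges = set()
    for i, j in combinations(range(N), 2):
        if i in clique and j in clique:
            edges.add((i, j))  # (Gamma_1): every pair inside V* is an edge
        elif rng.random() < p:
            edges.add((i, j))  # (Gamma_2): other pairs appear with probability p
    return edges, clique
```

In an experiment one would typically shuffle the vertex labels afterwards so that the clique is not trivially located at the first $n$ indices.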

Theorem 3.3 There exists an $\alpha > 0$ depending on $p$ such that for all $G$ constructed via
($\Gamma_1$), ($\Gamma_2$) with $n \ge \alpha\sqrt{N}$, the clique defined by $V^* \times V^*$ is the unique maximum clique of $G$
and will correspond to the unique solution of (12) with probability tending exponentially to 1
as $N \to \infty$.

Proof: Consider the matrix $W$ constructed as in ($\omega_1$)-($\omega_6$) with $\gamma = -p/(1 - p)$. By
Theorem 3.2, $X^*$ is the unique optimum if

$$\|W\| < 1 \quad \text{and} \quad p_j < n \ \text{ for all } j \in V - V^*. \tag{21}$$

We first show that $\|W\| < 1$ with probability tending exponentially to 1 as $N \to \infty$ in the
case that $n = \Omega(\sqrt{N})$. We write $W = W_1 + W_2 + W_3 + W_4 + W_5$, where each of the five
terms is defined as follows.

We first define $W_1$. For cases ($\omega_2$) and ($\omega_4$), choose $W_1(i,j) = W(i,j)$. For cases ($\omega_5$)
and ($\omega_6$), take $W_1(i,j) = -p/((1 - p)n)$. For case ($\omega_1$), choose $W_1(i,j)$ randomly such that
$W_1(i,j)$ is equal to $1/n$ with probability $p$ and equal to $-p/((1 - p)n)$ otherwise. Similarly,
in case ($\omega_3$), take $W_1(i,i)$ to be equal to $1/n$ with probability $p$ and equal to $-p/((1 - p)n)$
otherwise. By construction, each entry of $W_1$ is an independent random variable with the
distribution

$$W_1(i,j) = \begin{cases} 1/n & \text{with probability } p, \\ -p/((1 - p)n) & \text{with probability } 1 - p. \end{cases}$$

Lemma 2.1 then bounds $\|W_1\|$ by a constant multiple of $\sqrt{N}/n$
with probability at least $1 - \exp(-c_1 N^{1/6})$ for some constant $c_1 > 0$.

Next, $W_2$ is the correction matrix to $W_1$ in case ($\omega_1$). That is, $W_2(i,j)$ is chosen such
that

$$W_2(i,j) + W_1(i,j) = W(i,j) = 0$$

for all $(i,j) \in V^* \times V^*$ and is zero everywhere else. As before, applying Lemma 2.1 shows
that

$$\|W_2\| \le 3 \left( \frac{p}{1 - p} \right)^{1/2} \frac{1}{\sqrt{n}} \tag{22}$$

with probability at least $1 - \exp(-c_1 n^{1/6})$. Similarly, $W_3$ is the correction to $W_1$ in case ($\omega_3$),
that is

$$W_3(i,i) = W(i,i) - W_1(i,i)$$

for all $i \in V - V^*$ and all other entries are equal to 0. Therefore, $W_3$ is a diagonal matrix with
diagonal entries bounded by $2/n$. It follows that

$$\|W_3\| \le \frac{2}{n}. \tag{23}$$

Finally, $W_4$ and $W_5$ are the corrections for cases ($\omega_5$) and ($\omega_6$) respectively. These are exactly
of the form $(A - \tilde{A})/n$ as in Theorem 2.4, in which $N$ in the theorem stands for $N - n$ in
the present context. Examining each term of (3) shows that in the case $n = \Omega(N^{1/2})$, the
probability on the right-hand side is of the form $1 - c\exp(-kN^{c_2})$. It follows that there exists a
constant $\alpha_4 > 0$ such that

$$\|W_4\|^2 \le \|W_4\|_F^2 \le \alpha_4^2 N n^{-2}$$

with probability tending exponentially to 1 as $N \to \infty$. Moreover, since Condition F is
satisfied in this case, $p_j < n$ for all $j \in V - V^*$. Notice that, by symmetry, $W_4 = W_5^T$. Thus,
since each of $\|W_1\|, \|W_2\|, \ldots, \|W_5\|$ is bounded by an arbitrarily small constant when $n \ge \alpha\sqrt{N}$ with $\alpha$ sufficiently large,
there exists a constant $\alpha > 0$ such that $\|W\| < 1$ with probability tending exponentially to 1
as $N \to \infty$, as required.
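The value $\gamma = -p/(1 - p)$ is what centres the random entries of $W_1$. As a worked check (added here for illustration, using only the two-point distribution stated in the proof), the mean and variance of a single entry are:

```latex
\begin{aligned}
\mathbb{E}[W_1(i,j)] &= p \cdot \frac{1}{n} + (1 - p) \cdot \left( -\frac{p}{(1 - p)n} \right)
  = \frac{p}{n} - \frac{p}{n} = 0, \\
\operatorname{Var}[W_1(i,j)] &= p \cdot \frac{1}{n^2} + (1 - p) \cdot \frac{p^2}{(1 - p)^2 n^2}
  = \frac{p}{n^2} \left( 1 + \frac{p}{1 - p} \right)
  = \frac{p}{(1 - p) n^2}.
\end{aligned}
```

So each entry has mean zero and standard deviation $\sqrt{p/(1 - p)}/n$, which is the setting in which norm bounds for random matrices with independent mean-zero entries apply.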



4 Maximum Edge Biclique 

Consider a bipartite graph $G = ((U, V), E)$ where $|U| = M$, $|V| = N$. The adjacency matrix
of a biclique $H$ of $G$ is a rank-one matrix $X \in \mathbb{R}^{M \times N}$. This matrix $X$ has the property that
$X_{ij} = 0$ for all $i \in U$, $j \in V$ such that $(i, j) \notin E$. It follows that a biclique of $G$ of size $mn$
can be found (if one exists) by solving the rank minimization problem

$$
\begin{aligned}
\min\ & \operatorname{rank}(X) \\
\text{s.t.}\ & \sum_{i \in U} \sum_{j \in V} X_{ij} \ge mn, && (24) \\
& X_{ij} = 0 \quad \forall\, (i,j) \in (U \times V) - E, && (25) \\
& X \in [0, 1]^{U \times V}. && (26)
\end{aligned}
$$

A rank-one solution $X^*$ to this problem corresponds to the adjacency matrix of a biclique of $G$
containing at least $mn$ edges. As with the maximum clique problem, this rank minimization
problem is still NP-hard. As before, we underestimate $\operatorname{rank}(X)$ with $\|X\|_*$. We obtain the
following convex optimization problem:

$$
\begin{aligned}
\min\ & \|X\|_* \\
\text{s.t.}\ & \sum_{i \in U} \sum_{j \in V} X_{ij} \ge mn, && (27) \\
& X_{ij} = 0 \quad \text{if } (i,j) \notin E.
\end{aligned}
$$

Using the Karush-Kuhn-Tucker conditions, we derive conditions for which the adjacency
matrix of a graph comprising a biclique of $G$ is optimal for this relaxation. Indeed, the
following is an immediate consequence (essentially a restatement) of Theorem 3.1.

Theorem 4.1 Let $U^* \times V^*$ be the vertex set of a biclique in $G$ in which $|U^*| = m$ and
$|V^*| = n$. Let $u \in \mathbb{R}^M$ be the characteristic vector of $U^*$, and let $v \in \mathbb{R}^N$ be the characteristic
vector of $V^*$. Let $X^* = uv^T$. (Clearly $X^*$ is feasible for (27).) Let $E = E(G)$ and let $\bar{E}$
be its complement. Suppose also that there exist $W \in \mathbb{R}^{M \times N}$, $\lambda \in \mathbb{R}^{M \times N}$ and $\mu \in \mathbb{R}_+$ such
that $Wv = 0$, $u^T W = 0$, $\|W\| \le 1$ and

$$\frac{uv^T}{\sqrt{mn}} + W = \mu ee^T + \sum_{(i,j) \in \bar{E}} \lambda_{ij} e_i e_j^T. \tag{28}$$

Then $X^*$ is an optimal solution to (27). Moreover, $G$ does not contain any biclique with
more than $mn$ edges. Furthermore, if $\|W\| < 1$ and $\mu > 0$, then $X^*$ is the unique optimizer
of (27) and $U^* \times V^*$ is the unique optimal biclique.
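The feasibility of $X^* = uv^T$ noted in the theorem can be checked mechanically on small instances. The following Python sketch is an illustration only: the helper names `biclique_matrix` and `is_feasible`, and the labelling of $U$ and $V$ by $0, \ldots, M-1$ and $0, \ldots, N-1$, are assumptions made here. It builds $X^*$ from the characteristic vectors and tests the two constraints of the relaxation.

```python
def biclique_matrix(M, N, U_star, V_star):
    """Return X* = u v^T as a list of rows, where u and v are the 0/1
    characteristic vectors of U_star and V_star."""
    u = [1 if i in U_star else 0 for i in range(M)]
    v = [1 if j in V_star else 0 for j in range(N)]
    return [[u[i] * v[j] for j in range(N)] for i in range(M)]


def is_feasible(X, edges, m, n):
    """Check sum_ij X_ij >= m*n and X_ij = 0 whenever (i, j) is not an edge."""
    total = sum(sum(row) for row in X)
    zero_off_edges = all(
        X[i][j] == 0
        for i in range(len(X))
        for j in range(len(X[0]))
        if (i, j) not in edges
    )
    return total >= m * n and zero_off_edges
```

If some pair of $U^* \times V^*$ is missing from `edges`, the second constraint fails, which reflects the requirement that $U^* \times V^*$ really is a biclique of $G$.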



In the next two subsections, we consider two scenarios for how to construct a bipartite
graph $G$ and biclique that satisfy the conditions of the theorem.

In both scenarios, we will take $\mu = 1/\sqrt{mn}$ and consider $W$ and $\lambda$ defined according to
the following cases.

($\psi_1$) For $(i,j) \in U^* \times V^*$, taking $W_{ij} = 0$ and $\lambda_{ij} = 0$ ensures the $ij$-entries of both sides of
(28) are equal to $1/\sqrt{mn}$.

($\psi_2$) For $(i,j) \in E - (U^* \times V^*)$, we take $W_{ij} = 1/\sqrt{mn}$ and $\lambda_{ij} = 0$. Again, the $ij$-entries
of both sides of (28) are equal to $1/\sqrt{mn}$.



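($\psi_3$) For $(i,j) \notin E$ such that $i \notin U^*$ and $j \notin V^*$, we select $W_{ij} = -\gamma/\sqrt{mn}$ and
$\lambda_{ij} = -(1 + \gamma)/\sqrt{mn}$, where $\gamma$ will be defined below. In this case, the $ij$-entries of each side
of (28) are equal to $-\gamma/\sqrt{mn}$.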

($\psi_4$) For $(i,j) \notin E$ such that $i \notin U^*$ and $j \in V^*$, we choose

$$W_{ij} = -\frac{p_i}{(n - p_i)\sqrt{mn}} \quad \text{and} \quad \lambda_{ij} = -\frac{1}{\sqrt{mn}} \left( \frac{n}{n - p_i} \right)$$

where $p_i$ is equal to the number of edges with left endpoint equal to $i$ and right endpoint
in $V^*$. Note that if $n = p_i$ then $i$ is connected to every vertex of $V^*$ and thus the KKT
condition cannot possibly be satisfied. If $p_i < n$, both sides of (28) are equal to
$-p_i/((n - p_i)\sqrt{mn})$.

($\psi_5$) For $(i,j) \notin E$ such that $i \in U^*$ and $j \notin V^*$, we choose

$$W_{ij} = -\frac{q_j}{(m - q_j)\sqrt{mn}} \quad \text{and} \quad \lambda_{ij} = -\frac{1}{\sqrt{mn}} \left( \frac{m}{m - q_j} \right)$$

where $q_j$ is equal to the number of edges with right endpoint equal to $j$ and left endpoint
in $U^*$. As before, this is appropriate only if $q_j < m$.

We next check that $W$ satisfies the requirements for $uv^T/\sqrt{mn} + W$ to be a subgradient of $\|\cdot\|_*$ at $uv^T$:
$Wv = 0$, $W^T u = 0$, and $\|W\| \le 1$. To show that $Wv = 0$, choose row $i$ of $W$ and consider
$W(i, :)v = \sum_{j \in V^*} W_{ij}$. If $i \in U^*$ then $W_{ij} = 0$ for all $j \in V^*$, so $W(i, :)v = 0$. In the case
$i \notin U^*$, consider each $j \in V^*$. If $(i,j) \in E$ then, by Case ($\psi_2$), $W_{ij} = 1/\sqrt{mn}$. There are $p_i$
such entries, with sum $p_i/\sqrt{mn}$. If $(i,j) \notin E$ then $W_{ij} = -p_i/((n - p_i)\sqrt{mn})$. There are
$n - p_i$ such entries, with sum $-p_i/\sqrt{mn}$. It follows that $W(i, :)v = 0$ as required.
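The case analysis ($\psi_1$)-($\psi_5$) translates directly into code, which makes the identities $Wv = 0$ and $W^T u = 0$ easy to confirm on small examples. The sketch below is illustrative only: the names `build_W`, `row_sums_over` and `col_sums_over` are hypothetical, vertices are labelled by integers, and `edges` is assumed to contain every pair of $U^* \times V^*$. It stops with an error when some $p_i = n$ or $q_j = m$, where the construction is not defined.

```python
import math


def build_W(M, N, edges, U_star, V_star, gamma=0.0):
    """Build W entry by entry following cases (psi_1)-(psi_5)."""
    m, n = len(U_star), len(V_star)
    s = math.sqrt(m * n)
    # p[i]: number of neighbours of i in V*; q[j]: number of neighbours of j in U*
    p = [sum((i, j) in edges for j in V_star) for i in range(M)]
    q = [sum((i, j) in edges for i in U_star) for j in range(N)]
    W = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            in_U, in_V = i in U_star, j in V_star
            if in_U and in_V:
                W[i][j] = 0.0                       # (psi_1)
            elif (i, j) in edges:
                W[i][j] = 1.0 / s                   # (psi_2)
            elif not in_U and not in_V:
                W[i][j] = -gamma / s                # (psi_3)
            elif in_V:                              # (psi_4): i outside U*, j in V*
                if p[i] >= n:
                    raise ValueError("p_i = n: construction undefined")
                W[i][j] = -p[i] / ((n - p[i]) * s)
            else:                                   # (psi_5): i in U*, j outside V*
                if q[j] >= m:
                    raise ValueError("q_j = m: construction undefined")
                W[i][j] = -q[j] / ((m - q[j]) * s)
    return W


def row_sums_over(W, V_star):
    """Entries of W v, where v is the characteristic vector of V*."""
    return [sum(row[j] for j in V_star) for row in W]


def col_sums_over(W, U_star):
    """Entries of W^T u, where u is the characteristic vector of U*."""
    return [sum(W[i][j] for i in U_star) for j in range(len(W[0]))]
```

Both sum vectors should vanish up to floating-point rounding. The remaining requirement, $\|W\| \le 1$, is the one that depends on the structure of $G$.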

The proof that $W^T u = 0$ is symmetric. It remains to determine which bipartite
graphs $G$ yield $W$ as defined above such that $\|W\| < 1$. As in the maximum clique case, we
present two different analyses.



4.1 The Adversarial Case 

Suppose that the edge set of the bipartite graph $G = ((U, V), E)$ is generated as follows.
We first add a biclique $U^* \times V^*$ with $|U^*| = m$, $|V^*| = n$. Then, as in the adversarial case
for the maximum clique problem, an adversary is allowed to add a number of the remaining
$|U||V| - mn$ potential edges to the graph. We will show that, under certain conditions, our
adversary can add up to $O(mn)$ edges to the graph and $U^* \times V^*$ will still be a maximum
edge biclique of $G$.

We make the following assumptions on the structure of G: 

1. G contains at most r edges aside from those of the optimal biclique. 

2. Each vertex of $V - V^*$ is adjacent to at most $\alpha m$ vertices of $U^*$ for some $\alpha \in (0, 1)$.

3. Each vertex of $U - U^*$ is adjacent to at most $\beta n$ vertices of $V^*$ for some $\beta \in (0, 1)$.

Consider $W$ as defined by ($\psi_1$)-($\psi_5$) with $\gamma = 0$. As before, we use the bound $\|W\| \le \|W\|_F$.
Notice that at most $r$ entries of $W(U - U^*, V - V^*)$ are equal to $1/\sqrt{mn}$ and the remainder
are equal to 0. Therefore,

$$\|W(U - U^*, V - V^*)\|_F^2 \le \frac{r}{mn}.$$

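Moreover, for each $j \in V - V^*$, $q_j \le \alpha m$. It follows that

$$
\begin{aligned}
\|W(U^*, V - V^*)\|_F^2
&= \sum_{v \in V - V^*} \left( \frac{q_v}{mn} + (m - q_v) \frac{q_v^2}{mn(m - q_v)^2} \right) \\
&= \sum_{v \in V - V^*} \frac{q_v}{mn} \left( 1 + \frac{q_v}{m - q_v} \right) \\
&\le \sum_{v \in V - V^*} \frac{q_v}{mn} \left( 1 + \frac{\alpha}{1 - \alpha} \right) \\
&= \sum_{v \in V - V^*} \frac{q_v}{mn(1 - \alpha)} \le \frac{r}{mn(1 - \alpha)}.
\end{aligned}
$$

Similarly,

$$\|W(U - U^*, V^*)\|_F^2 \le \frac{r}{(1 - \beta)mn}.$$

Therefore, $\|W\| < 1$ if

$$r \left( 1 + \frac{1}{1 - \alpha} + \frac{1}{1 - \beta} \right) < mn.$$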



Thus, the graph can contain up to $O(mn)$ diversionary edges, yet the optimality and uniqueness
conditions given by Theorem 4.1 are still satisfied. This result is the best possible up
to constants for the same reasons explained at the end of Section 3.2.
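To get a feel for the constants, here is a worked instance of the sufficient condition $r(1 + 1/(1 - \alpha) + 1/(1 - \beta)) < mn$. The numerical values are chosen purely for illustration and do not come from the paper.

```latex
m = n = 10, \quad \alpha = \beta = \tfrac{1}{2}:
\qquad
r \left( 1 + \frac{1}{1 - \tfrac12} + \frac{1}{1 - \tfrac12} \right) = 5r < 100
\quad \Longleftrightarrow \quad r < 20.
```

So a planted $10 \times 10$ biclique is certified by this argument against up to 19 adversarial edges, provided no outside vertex is joined to more than 5 vertices on the opposite side of the biclique. The bound scales linearly in $mn$, with a constant that degrades as $\alpha$ or $\beta$ approaches 1.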

4.2 The Random Case 

Let $y$, $z$ be fixed positive scalars. Let $U$, $V$ be two disjoint vertex sets with $|V| = N$ and
$|U| = \lceil yN \rceil$. Consider $U^* \subseteq U$ and $V^* \subseteq V$ such that $|V^*| = n$ and $|U^*| = m = \lceil zn \rceil$.
Suppose the edges of the bipartite graph $G = ((U, V), E)$ are determined as follows:

($\beta_1$) For all $(i,j) \in U^* \times V^*$, $(i,j) \in E$.

($\beta_2$) For each of the remaining potential edges $(i,j) \in U \times V$, we add edge $(i,j)$ to $E$ with
probability $p$ (independently).

Notice $G$ contains the biclique $(U^*, V^*)$. As in the maximum clique problem, if $n = \Omega(\sqrt{N})$
and $G$ is constructed as in ($\beta_1$), ($\beta_2$) then $U^* \times V^*$ is optimal for the convex problem (27).
We have the following theorem.
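The random model ($\beta_1$), ($\beta_2$) and the degree conditions $q_j < m$, $p_i < n$ that the analysis relies on can be explored numerically. The following sketch is a minimal illustration with hypothetical names (`planted_biclique`, `degree_conditions_hold`); it assumes the first $m$ labels of $U$ and the first $n$ labels of $V$ form the planted biclique.

```python
import math
import random


def planted_biclique(N, n, y, z, p, seed=None):
    """Sample edges of a bipartite graph with |V| = N, |U| = ceil(y*N),
    |V*| = n and |U*| = m = ceil(z*n).  Returns (M, edges, U_star, V_star)."""
    M, m = math.ceil(y * N), math.ceil(z * n)
    if not (0 < m <= M and 0 < n <= N and 0.0 <= p <= 1.0):
        raise ValueError("inconsistent sizes or probability")
    rng = random.Random(seed)
    U_star, V_star = set(range(m)), set(range(n))
    edges = set()
    for i in range(M):
        for j in range(N):
            # (beta_1): biclique pairs are always edges;
            # (beta_2): every other pair is an edge with probability p.
            if (i in U_star and j in V_star) or rng.random() < p:
                edges.add((i, j))
    return M, edges, U_star, V_star


def degree_conditions_hold(M, N, edges, U_star, V_star):
    """Check q_j < m for all j outside V* and p_i < n for all i outside U*."""
    m, n = len(U_star), len(V_star)
    q_ok = all(sum((i, j) in edges for i in U_star) < m
               for j in range(N) if j not in V_star)
    p_ok = all(sum((i, j) in edges for j in V_star) < n
               for i in range(M) if i not in U_star)
    return q_ok and p_ok
```

When a degree condition fails, some outside vertex is adjacent to an entire side of the planted biclique, so the planted biclique can be enlarged and the certificate $W$ is not defined.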

Theorem 4.2 There exists $\alpha > 0$ depending on $p$, $y$, $z$ such that for each bipartite graph
$G$ constructed via ($\beta_1$), ($\beta_2$) with $n \ge \alpha\sqrt{N}$ the biclique defined by $U^* \times V^*$ is the maximum
edge biclique of $G$ with probability tending exponentially to 1 as $N \to \infty$ and is found as the
unique solution to the convex relaxation (27).

Let $W$ be constructed as in ($\psi_1$)-($\psi_5$) with $\gamma = -p/(1 - p)$. Then $X^* = uv^T$ is the
unique optimal solution of (27) if

$$\|W\| < 1, \quad q_j < \lceil zn \rceil \ \ \forall\, j \in V - V^*, \quad \text{and} \quad p_j < n \ \ \forall\, j \in U - U^*.$$

We write $W = W_1 + W_2 + W_3 + W_4$, where each of the summands is defined as follows. We first define $W_1$. If $(i,j) \in U^* \times V^*$, then
we set $W_1(i,j) = 1/\sqrt{mn}$ with probability $p$ and equal to $\gamma/\sqrt{mn}$ with probability $1 - p$. For
$(i,j) \in (U \times V) - (U^* \times V^*)$, we set $W_1(i,j) = 1/\sqrt{mn}$ if $(i,j) \in E$ and set $W_1(i,j) = \gamma/\sqrt{mn}$
otherwise. We bound $\|W_1\|$ using Theorem 2.2. As in the clique case, $W_2$ is the correction to $W_1$ on $U^* \times V^*$.
Again, by Theorem 2.2 we conclude that $\|W_2\|$ is bounded above by a constant multiple of $1/\sqrt{n}$
with probability at least $1 - c_1' \exp(-c_2' n^{c_3'})$ for some $c_1', c_2', c_3' > 0$.

It remains to derive bounds for $\|W_3\|$ and $\|W_4\|$. Notice that the construction of $W(U^*, V - V^*)$
and $W(U - U^*, V^*)$ is identical to that in Case ($\omega_5$) for the maximum clique problem.
Thus, we can again apply Theorem 2.4, first to $W_3$ (in which case $(n, N)$ in the theorem
stand for $(\lceil zn \rceil, N - n)$) and second to $W_4$ (in which case $(n, N)$ in the theorem stand
for $(n, \lceil yN \rceil - \lceil zn \rceil)$), to conclude that $\|W_3\|$ and $\|W_4\|$ are both strictly bounded above by
constants, with probability tending to 1 exponentially fast, provided $n = \Omega(\sqrt{N})$. Moreover,
as before, Condition F is satisfied in this case and thus $q_j < \lceil zn \rceil$ for all $j \in V - V^*$ and
$p_j < n$ for all $j \in U - U^*$ as required. ■

5 Conclusions 

We have shown that the maximum clique and maximum biclique problems can be solved 
in polynomial time using nuclear norm minimization, a technique recently proposed in the 
compressive sensing literature, provided that the input graph consists of a single clique or 
biclique plus diversionary edges. The spectral technique used by Alon et al. [2] for the
planted clique problem has been extended to other problems; see, e.g., McSherry [TJJ. It 
would be interesting to extend the nuclear norm approach to other NP-hard problems as 
well. 